POSEIDON

Ernesto

October 19, 2017

What do we want

  • Policy Simulator
    • Agents Flexibility
    • Model Flexibility
  • Working Objective: isolate and understand policy effects and noise
  • Final Objective: to connect indicators with actions

How do others do it?

  • Random Utility Models
    • Statistically Efficient
    • Easily Generalizable
    • Policy-Brittle
  • Dynamic Programming
    • Strongly Rational
    • Computationally Expensive
    • Ad hoc

The Agent Problem

  • Find the most profitable spot to fish
  • Constraints:
    • No biomass information
    • No model knowledge
    • Environment changes over time
  • Subproblems:
    • How to explore
    • Explore-Exploit Tradeoff

Explore-Exploit-Imitate

# with probability epsilon, explore
if (runif(1) < epsilon) {
  # shock your position by a random step of at most delta
  position <- position + runif(1, min = -delta, max = delta)
} else if (profits[me] < profits[best_friend]) {
  # imitate: if a friend is doing better, play their slot machine
  position <- positions[best_friend]
}
# otherwise exploit: keep playing the previous slot machine
play(position)
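
The rule above can be run for a whole fleet. The following is a minimal sketch, not the POSEIDON implementation: the 1-D map, the profit function `play`, the two random friends per agent, and all parameter values are assumptions made for illustration.

```r
# Minimal sketch of explore-exploit-imitate on a 1-D line of fishing spots.
# The profit function, friend selection, and parameter values are hypothetical.
set.seed(42)

n_agents <- 20
epsilon  <- 0.2   # probability of exploring
delta    <- 5     # maximum exploration step
n_steps  <- 200

# Hypothetical environment: profit peaks at spot 70; agents do not know this.
play <- function(position) 100 - abs(position - 70)

positions <- runif(n_agents, min = 0, max = 100)
profits   <- play(positions)
initial_mean_profit <- mean(profits)

for (step in seq_len(n_steps)) {
  # Decisions use the positions and profits observed in the previous step.
  old_positions <- positions
  old_profits   <- profits
  for (me in seq_len(n_agents)) {
    # Assumption: each agent observes two randomly drawn friends.
    friends     <- sample(setdiff(seq_len(n_agents), me), 2)
    best_friend <- friends[which.max(old_profits[friends])]
    if (runif(1) < epsilon) {
      # explore: shock the position by at most delta
      positions[me] <- old_positions[me] + runif(1, min = -delta, max = delta)
    } else if (old_profits[me] < old_profits[best_friend]) {
      # imitate: copy the better-performing friend
      positions[me] <- old_positions[best_friend]
    }
    # otherwise exploit: stay at the previous spot
  }
  positions <- pmin(pmax(positions, 0), 100)  # keep agents on the map
  profits   <- play(positions)
}

final_mean_profit <- mean(profits)
```

Comparing `final_mean_profit` with `initial_mean_profit` shows whether the fleet has moved toward the profitable spot without any agent knowing the profit function.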

Many Agents

Cui prodest? (Who benefits?)

  • Model free
  • Adaptive

Oil Prices

Fish the Line (part 1)

Fish the Line (part 2)

A flexible simulator

  • Flexible in terms of:
    • Decisions
    • Biology
    • Algorithms

Target Switching

Gear Selection

OSMOSE

WFS

Gravitational Search - Demo

Kernel Regression

Kernel Regression - Demo

Policies

Simulating Policies

  • Open Loop
    • Scenario Evaluation
    • Policy Optimization
  • Closed Loop
    • Policy Search

Open Loop

Scenario Evaluation

  1. You have adaptive agents
  2. Somebody hands you a set of policies to test
  3. Apply each in turn
  4. Check which performs best
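
The four steps above can be sketched as a loop over candidate policies. Here `run_scenario` is a hypothetical stand-in for a full simulation run (a logistic stock fished under a yearly catch cap), and the policy names and numbers are invented for illustration.

```r
# Hypothetical stand-in for one simulation run: returns cumulative landings
# for a logistic stock fished under a yearly catch cap.
run_scenario <- function(catch_cap, years = 20) {
  biomass  <- 1000     # assumed starting biomass
  landings <- 0
  for (year in seq_len(years)) {
    catch    <- min(catch_cap, biomass)
    biomass  <- biomass - catch
    landings <- landings + catch
    # logistic regrowth: assumed growth rate 0.3, carrying capacity 1000
    biomass  <- biomass + 0.3 * biomass * (1 - biomass / 1000)
  }
  landings
}

# 2. somebody hands you a set of policies to test
policies <- c(no_fishing = 0, low_cap = 25, medium_cap = 70, high_cap = 400)

# 3. apply each in turn
scores <- sapply(policies, run_scenario)

# 4. check which performs best
best_policy <- names(which.max(scores))
```

In this toy setup the intermediate cap scores highest, because the largest cap collapses the stock within a few years. With adaptive agents the ranking would come out of the simulation rather than a closed-form stock model.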

TAC vs ITQ (mileage)

TAC vs ITQ (catchability)

Seventy-Thirty World

ITQ Prices

  • Quotas are distributed 90% red, 10% blue

Blues are choke species

ITQ drives Gear (start)

ITQ drives Gear (end)

Gear fixes waste

ITQ incentivizes geography

Policy Optimization

  1. You have adaptive agents
  2. Somebody hands you a family of policies
  3. You want to find the “best” parameters
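
When the policy family has a single parameter, the search can be as simple as a grid. The sketch below assumes a one-species toy model (`simulate_quota`) and an objective of landings plus remaining biomass; neither comes from the simulator itself.

```r
# Hypothetical one-species stand-in for the simulator: a logistic stock fished
# under a yearly quota. All numbers are assumed.
simulate_quota <- function(quota, years = 20) {
  biomass  <- 1000
  landings <- 0
  for (year in seq_len(years)) {
    catch    <- min(quota, biomass)
    landings <- landings + catch
    biomass  <- biomass - catch
    biomass  <- biomass + 0.3 * biomass * (1 - biomass / 1000)
  }
  # assumed objective: what was landed plus what is left in the water
  landings + biomass
}

# the family of policies is indexed by one parameter, the quota
quotas <- seq(0, 200, by = 5)
scores <- vapply(quotas, simulate_quota, numeric(1))

# the "best" parameter is the one with the highest score on the grid
best_quota <- quotas[which.max(scores)]
```

A grid is used here because the objective need not be smooth or unimodal; with more parameters or a costly simulation, a proper optimizer would replace it.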

Optimal Quotas

\[ \text{Score} = \text{Blue Biomass}_{t=20} + \sum_{i=1}^{20} \text{Red Landings}_{t=i}\]

  • Geographically split map
  • 300 fishers
  • Very different quota values for TAC and ITQ
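
As a worked instance of the score formula, the sketch below computes it from two yearly series. The series are made up; in practice they would be simulation output.

```r
# Score = blue biomass at t = 20 plus cumulative red landings over t = 1..20.
score <- function(blue_biomass, red_landings) {
  stopifnot(length(blue_biomass) == 20, length(red_landings) == 20)
  blue_biomass[20] + sum(red_landings)
}

# Made-up yearly series, for illustration only
blue_biomass <- seq(from = 1000, to = 810, by = -10)  # 20 values, ending at 810
red_landings <- rep(50, 20)                           # 50 units landed each year

score(blue_biomass, red_landings)  # 810 + 20 * 50 = 1810
```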

Optimal TAC

Optimal ITQ

Well-mixed world?

In a scenario where fishers are unable to respond to incentives, the optimal quotas under TACs and ITQs are exactly the same

Pareto Front

Heterogeneous fleets

  • 2 kinds of boat:
    • Small boats
    • Large boats
  • 2 Objectives:
    • Maximize small boat income
    • Maximize efficiency
  • 1 Policy lever:
    • Build MPA
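
With two objectives there is usually no single best MPA, only a set of non-dominated options. The sketch below extracts that set from a table of candidate outcomes; the candidates and their values are invented for illustration.

```r
# Made-up outcomes for five candidate MPA layouts (illustration only).
candidates <- data.frame(
  mpa          = c("none", "small", "medium", "large", "huge"),
  small_income = c(10, 14, 18, 20, 12),
  efficiency   = c(30, 28, 22, 15, 10),
  stringsAsFactors = FALSE
)

# A candidate is dominated if another one is at least as good on both
# objectives and strictly better on at least one.
is_dominated <- function(i, df) {
  any(df$small_income >= df$small_income[i] &
      df$efficiency   >= df$efficiency[i] &
      (df$small_income > df$small_income[i] |
       df$efficiency   > df$efficiency[i]))
}

dominated    <- vapply(seq_len(nrow(candidates)), is_dominated,
                       logical(1), df = candidates)
pareto_front <- candidates[!dominated, ]
```

Here only the "huge" layout drops out, since the "small" layout beats it on both objectives; the remaining four trade small boat income against efficiency.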

Fairness Front

Right-most solution

Left-most solution

Closed Loop

Bluemania

  • Well mixed world
  • Want to incentivize gear change through a landing tax
  • Blue fish worth 3 times red fish

No intervention

PID Taxation

  • Expensive (blue) stock gets consumed too rapidly
  • Geographically separated
  • Update the tax smoothly so that only about 600 units of blue stock are landed each day
  • Poor man’s quotas
  • Use a PI controller: \[ p_{t+1} = a e_t + b \sum_{i=0}^{t} e_i \] \[ e_t = \text{Landings}_t - 600 \]
  • “Autopilot” policy
  • Parameters matter
  • Noise matters
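
The PI update can be sketched in a few lines. The landing response below is a made-up noisy linear rule standing in for the agents, and the gains `a` and `b` are assumed values chosen so that this toy loop is stable; only the 600-unit target comes from the slide.

```r
# Minimal PI tax sketch. The landing response to the tax is hypothetical;
# in the simulator it would emerge from agent behaviour.
set.seed(1)

target <- 600    # desired daily blue landings
a      <- 0.01   # proportional gain (assumed)
b      <- 0.005  # integral gain (assumed)
days   <- 300

# Hypothetical response: 1000 units landed with no tax, 50 fewer per unit of tax.
landings_given_tax <- function(tax) max(0, 1000 - 50 * tax + rnorm(1, sd = 20))

tax          <- 0
error_sum    <- 0
landings_log <- numeric(days)

for (day in seq_len(days)) {
  landings          <- landings_given_tax(tax)
  landings_log[day] <- landings
  error     <- landings - target           # e_t
  error_sum <- error_sum + error           # running sum of e_i
  tax       <- a * error + b * error_sum   # p_{t+1}
  tax       <- max(0, tax)                 # assumption: the tax cannot go negative
}

late_mean_landings <- mean(tail(landings_log, 50))
```

Changing `a`, `b`, or the noise level in `landings_given_tax` is a quick way to see why parameters and noise matter: poorly chosen gains make the tax oscillate instead of settling.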

PID Taxation - demo

PID Taxation - optimal

RL Policy